Four hundred and fifty-eight genes coding for PentatricoPeptide Repeat (PPR) proteins are annotated in the Arabidopsis thaliana genome. Over the past 10 years, numerous reports have shown that many of these proteins function in organelles, where they target specific transcripts and take part in post-transcriptional regulation. They are therefore thought to be important players in the coordination of nuclear and organelle genome expression. Only four of these proteins have been reported to be targeted outside organelles, suggesting that some PPRs could also function in the post-transcriptional regulation of nuclear genes.

In this work, we updated and extended current knowledge of the localization of Arabidopsis PPR proteins within the plant cell. In particular, we investigated the subcellular localization of 166 PPR proteins whose targeting predictions were ambiguous, using a combination of high-throughput cloning and microscopy. Through systematic localization experiments and data integration, we confirmed that PPR proteins are largely targeted to organelles and showed that dual targeting to both mitochondria and plastids occurs more frequently than expected. These results lead us to speculate that dual-targeted PPR proteins could be important for the fine coordination of gene expression in both organelles.



Introduction 

Plant nuclear genomes code for more than 99% of the 25,000 to 30,000 proteins required to build plant cells and tissues. 1 These proteins are targeted to various cell compartments, where they carry out specific cellular processes. Two other small genomes, derived from the primary endosymbiotic events that gave rise to the organelles, are found in mitochondria and plastids. 2 Throughout evolution, organelles have lost much of their original genomes through the transfer of genetic material to the nucleus. However, they have retained small genomes encoding key proteins and RNAs necessary for their biology. In Arabidopsis, 57 mitochondrial genes and 128 chloroplast genes have been annotated on the corresponding genomes (TAIR v10). The proteins encoded by these genes, acting together with imported nuclear-encoded proteins, play an important role in mitochondrial and plastid functions. 3,4 Many of the proteins encoded by genes transferred from organelles to the nucleus are important for organelle gene expression or metabolism and need to be targeted back to their original compartment. In addition, many other nuclear-encoded proteins have acquired functions in different steps of organelle biology. Overall, more than 3000 proteins encoded by the nuclear genome are predicted to be targeted to the organelles, 5 which requires both a coordinated regulation of nuclear and organellar gene expression and a precise control of protein targeting and import into the organelles. Several import systems exist in mitochondria and plastids, where translocation is mediated mainly by co-translational and post-translational machineries. The main machineries are well characterized. 6-8 They are named Translocase of the Outer/Inner Mitochondrial membrane complexes (TOM/TIM) in mitochondria and Translocase of the Outer/Inner Chloroplast membrane complexes (TOC/TIC) in plastids. TOM/TIM and TOC/TIC account for the targeting of most organellar proteins. These two translocase systems share similar structural organization and import mechanisms, including the recognition of a Targeting Peptide (TP) and the involvement of chaperones, receptors, and pore-forming proteins. 8 Despite these similarities, the translocation mechanisms are specific to each translocase. For example, translocation into plastids requires GTP hydrolysis, whereas translocation into mitochondria does not. 8

Organelle physiological processes are controlled by proteins expressed from distinct genomes, implying a tight and complex coordination of gene expression and, therefore, of intracellular signaling pathways between cell compartments. Whereas nuclear genes are largely regulated at the transcriptional level, organelle genes are often constitutively expressed but tightly regulated at post-transcriptional levels. 9 Imported nuclear proteins are necessary for a wide range of organellar transcriptional and post-transcriptional processes, including transcription, RNA processing, RNA editing, RNA splicing, and translation. Among these nuclear factors, members of the large PentatricoPeptide Repeat (PPR) protein family are increasingly recognized as central actors in the inter-compartmental coordination of gene expression. 10 As expected for proteins involved in complex genome regulation, they form one of the largest families encoded by the nuclear genome, with 458 members in Arabidopsis, 477 in rice, and up to
800 in Selaginella moellendorffii. n ~ ii 

A typical PPR protein consists of a tandem array of 2 to 30 motifs of 35 amino acids (known as PPR motifs), often preceded at the N terminus by a targeting peptide thought to direct the protein to an organelle. Several studies have confirmed that this targeting peptide is functional, suggesting that PPR proteins are predominantly targeted to mitochondria or plastids. 10,11 Based on the PPR motif sequences and their serial organization, we proposed a classification of PPR proteins into two main subfamilies. 11 In Arabidopsis, the largest one, named the P-type subfamily, contains 255 PPR proteins harboring tandem repeats of a simple canonical PPR motif (the P-type motif). The second one is known as the PLS-type subfamily and contains the remaining
203 PPR proteins." Their module-based structures but also bio- 
chemical and genetic data indicate that PPR proteins are able to 
interact in a sequence-specific way with organelle RNAs to ensure various post-transcriptional functions. 10,15 Recently, through computational and molecular biology approaches, an RNA recognition code was proposed for PPR proteins in which two adjacent PPR motifs recognize one specific nucleotide. 16,17 The specificity of base recognition is determined by a combination of three amino acids: two located in the first PPR motif (at the third and sixth positions) and the third at the first position of the subsequent PPR motif. 16,17
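The combination of residues described above can be illustrated with a short Python sketch that pulls out, for each pair of adjacent motifs, the residues at positions 3 and 6 of the first motif and position 1 of the next one. It assumes the motif boundaries are already known and that every motif is exactly 35 residues long; the one-letter strings in the example are placeholders rather than real PPR sequences, and the mapping from residue combinations to nucleotides (refs. 16 and 17) is not reproduced here.

```python
from typing import List, Tuple

MOTIF_LENGTH = 35  # canonical PPR motif length stated in the text

def code_residues(motifs: List[str]) -> List[Tuple[str, str, str]]:
    """For each motif followed by another, return the residues at positions 3
    and 6 of that motif and at position 1 of the next one (1-based positions)."""
    for motif in motifs:
        if len(motif) != MOTIF_LENGTH:
            raise ValueError(f"expected {MOTIF_LENGTH} residues, got {len(motif)}")
    # 1-based positions 3 and 6 are indices 2 and 5; position 1 is index 0.
    return [(cur[2], cur[5], nxt[0]) for cur, nxt in zip(motifs, motifs[1:])]

# Placeholder motifs: six marker letters followed by 29 filler residues.
motifs = ["ABCDEF" + "X" * 29, "GHIKLM" + "X" * 29, "NPQRST" + "X" * 29]
print(code_residues(motifs))  # [('C', 'F', 'G'), ('I', 'M', 'N')]
```

A tandem array of n motifs yields n - 1 such triplets, since the last motif has no successor to contribute a position-1 residue.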

PPR proteins have largely been associated with transcriptional, post-transcriptional, and translational regulation of organellar gene expression. 10 A growing number of PPR proteins have been shown to be required for RNA editing. For example, CHLORORESPIRATORY REDUCTION 4 (CRR4) is necessary for editing of the chloroplast ndhD transcript, 18 and MITOCHONDRIAL RNA EDITING FACTOR1 (MEF1) is required for editing of three mitochondrial transcripts. 19 Arabidopsis PPR proteins are also involved in the splicing of organelle transcripts: ORGANELLAR TRANSCRIPT PROCESSING43 (OTP43) and OTP51 are necessary for the correct trans-splicing of nad1 and cis-splicing of ycf3 transcripts, respectively. 20,21 Finally, PPR proteins are involved in translation. For example, CHLOROPLAST RNA PROCESSING 1 (CRP1) has been proposed to be a chloroplast translation regulator, 22 and PPR336 is associated with mitochondrial polysomes. 23 As expected for essential players in the expression of genes involved in respiration and photosynthesis, a large proportion of PPR gene mutants are embryo or gametophyte lethal. 11,24,25

Despite the growing literature indicating that PPR proteins function mainly in organelles, some members could also have targets in the nucleus or the cytoplasm. In Arabidopsis, four PPR proteins have been shown to be localized outside organelles. Two of them, PROTEINACEOUS RNase P 2 (PRORP2) and PRORP3, are localized exclusively in the nucleus, where they are needed for RNase P activity. 26 The other two have a more complex subcellular localization, with dual targeting to both mitochondria and the nucleus. GLUTAMINE-RICH PROTEIN23 (GRP23) interacts with RNA polymerase II in the nucleus, but its nuclear and mitochondrial functions are not yet understood. 27 Similarly, Hammani and co-workers showed that PPR PROTEIN LOCALIZED TO THE NUCLEUS AND MITOCHONDRIA1 (PNM1) is involved in protein translation in mitochondria, whereas in the nucleus it physically interacts with two proteins, NUCLEOSOME ASSEMBLY PROTEIN1 and the transcription factor TCP8. 28 In animals, one PPR protein localized outside organelles has also been reported, but its localization is still a matter of debate. This PPR protein, named BICOID STABILIZATION FACTOR (BSF) in Drosophila and Leucine-Rich PentatricoPeptide Repeat Containing (LRPPRC) in humans, was localized in the cytoplasm and nucleus of early Drosophila embryo cells, 29 with roles in transcription and RNA transport. Other authors showed the protein to be localized in mitochondria, where it is thought to be involved in mRNA maturation, polyadenylation, and translation. 30

Only a handful of PPR proteins have been shown to function outside organelles. Because many post-transcriptional processes are shared by the organelles and the nucleus, this number may be underestimated. To identify new Arabidopsis PPR proteins targeted outside the organelles, and also to improve our general knowledge of PPR targeting, we systematically investigated the subcellular localization of about a third of the PPR family, namely the members whose targeting predictions were ambiguous. We took advantage of a high-throughput cloning strategy combined with a transient expression system to determine whether the N-terminal targeting peptides of candidate PPR proteins were functional in directing the proteins into organelles. We report here that, despite erroneous predictions of subcellular localization, most PPR proteins are targeted to one of the organelles, and that a fraction of them, probably underestimated, are targeted to both mitochondria and plastids.

Results 

Localization study of PPR proteins with ambiguous localization predictions. We previously published a manually curated list of Arabidopsis PPR gene models. 12 When this work was initiated, the most accurate algorithms for predicting the subcellular localization of plant proteins were TargetP v1.01 31 and Predotar v1.03. 32 We therefore used them to identify Arabidopsis genes coding for PPR proteins with ambiguous localization predictions (Table 1). TargetP has since been improved in version 1.1. Among the 458 PPR genes, Predotar predicts that 244 and 92 PPR proteins are targeted to mitochondria and plastids, respectively, whereas 122 PPRs would not have any organelle localization (Table 2). TargetP v1.1 gives similar results, with 232 and 123 PPRs localized to mitochondria and plastids, respectively, and 103 PPRs without organelle localization (Table 1). Taken together, 166 PPR proteins were predicted not to be targeted to either of the two main plant organelles by at least one of the two programs (Predotar v1.03 and TargetP v1.01). Among them, 53 PPR proteins were predicted by neither algorithm to be targeted to the organelles. We chose to experimentally investigate the subcellular localization of these 166 PPR proteins, as they were good candidates for atypical functions outside the organelles.
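The candidate selection described above amounts to a set union (at least one program predicts no organelle) and a set intersection (neither program predicts an organelle). The Python sketch below assumes that each program's output has been reduced to one label per gene and that any label other than M or C (for example "none" or "ER") counts as non-organellar. The three At identifiers take their TargetP and Predotar labels from Table 1; GENE_X is hypothetical and stands for a protein that both programs send to mitochondria.

```python
from typing import Dict, Set, Tuple

# Labels treated as an organellar prediction; anything else (e.g. "none", "ER") is not.
ORGANELLAR = {"M", "C"}

def non_organellar(calls: Dict[str, str]) -> Set[str]:
    """Genes for which one program predicts no mitochondrial or plastid localization."""
    return {gene for gene, label in calls.items() if label not in ORGANELLAR}

def select_candidates(targetp: Dict[str, str],
                      predotar: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (genes flagged by at least one program, genes flagged by both)."""
    t = non_organellar(targetp)
    p = non_organellar(predotar)
    return t | p, t & p

targetp = {"At1g01970": "M", "At1g19720": "none", "At1g62260": "ER", "GENE_X": "M"}
predotar = {"At1g01970": "none", "At1g19720": "none", "At1g62260": "M", "GENE_X": "M"}
either, both = select_candidates(targetp, predotar)
print(sorted(either))  # ['At1g01970', 'At1g19720', 'At1g62260']
print(sorted(both))    # ['At1g19720']
```

Applied to the full gene list, the union corresponds to the 166 candidates and the intersection to the 53 proteins for which neither program predicts an organelle.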

Almost all proteins targeted to organelles carry a targeting peptide at their N-terminal extremity, which is cleaved during transfer through the organelle membranes. 8 A mitochondrial Targeting Peptide (mTP) is typically 40-50 amino acids long, 33-36 whereas a chloroplast Targeting Peptide (cTP) is usually up to 60 amino acids long. 37 To assess targeting peptide functionality, we systematically fused in frame the first 300 bp of each candidate PPR ORF, coding for the first 100 amino acids, to the Red Fluorescent Protein (RFP) coding sequence using Gateway technology. The aim of this approach was to experimentally detect any mTP or cTP present in the first 100 amino acids but not recognized by the prediction software. Vector cloning based on Gateway recombination was successful for 162 genes (97%). After agro-infiltration of Nicotiana benthamiana leaves with these constructs and subsequent protoplast preparation, we detected RFP signals for 131 constructs (79%) using either an epifluorescence or a confocal microscope. All localization experiments were repeated at least three times and observed independently by two of the authors. Table 1 summarizes all predicted and experimental data obtained during this study. Figure 1A presents examples of typical subcellular localizations observed with the 300 bp constructs. In Figure 1, RFP fluorescence was visualized using a confocal microscope and compared with the distribution of the mitochondrion-specific probe MitoTracker Green and with chlorophyll autofluorescence. In the overlay panels, which combine fluorescence from RFP (in red), MitoTracker (in green), and chlorophyll autofluorescence (in blue), the signal appears yellow when RFP co-localizes with MitoTracker staining, indicating a mitochondrial localization of the fusion protein, and violet when the RFP signal is localized in plastids. Sixty-eight and 31 of the 300 bp PPR constructs gave an exclusively mitochondrial or plastid localization, respectively, as exemplified by AT3G15130 and AT3G46610 in Figure 1A. Interestingly, 24 constructs exhibited a signal in both organelles (see, for example, AT2G36240 and AT5G47460 in Fig. 1A), and nine constructs gave localizations outside organelles, appearing as typical nuclear and cytosolic signals (AT1G06150 and AT1G06580 in Fig. 1A). These nuclear and cytosolic localizations suggest that the first 100 amino acids of these proteins do not encode a functional organelle-targeting peptide, and that the RFP fusion proteins remain where translation occurs (in the cytosol) and reach the nucleus by passive diffusion of small proteins through nuclear pores.
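The in-frame fusion used for the 300 bp constructs can be sketched in Python. This is an illustration of the reading-frame bookkeeping only, not the cloning protocol: the reporter and linker sequences are hypothetical placeholders (the actual RFP coding sequence and Gateway recombination sites are not reproduced), and the sketch checks only that the N-terminal fragment is 100 codons long, contains no stop codon, and is joined to the reporter by a linker that preserves the frame.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(nt: str) -> str:
    """Translate an uppercase DNA sequence whose length is a multiple of 3."""
    if len(nt) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))

def n_terminal_fusion(orf: str, reporter: str, n_codons: int = 100,
                      linker: str = "") -> str:
    """Join the first n_codons of an ORF, a linker and a reporter CDS in one frame."""
    fragment = orf[:3 * n_codons]          # 100 codons = 300 bp by default
    if len(fragment) != 3 * n_codons:
        raise ValueError("ORF shorter than the requested N-terminal fragment")
    if len(linker) % 3:
        raise ValueError("linker length must be a multiple of 3 to keep the frame")
    if "*" in translate(fragment):
        raise ValueError("stop codon inside the N-terminal fragment")
    return fragment + linker + reporter

orf = "ATG" + "GCT" * 150 + "TAA"      # toy ORF: Met followed by alanines
reporter = "ATGGTG"                     # hypothetical stand-in for the RFP CDS
fusion = n_terminal_fusion(orf, reporter)
print(len(fusion))                      # 306
print(translate(fusion)[-3:])           # AMV
```

The frame check matters because a linker whose length is not a multiple of three would shift the reporter out of frame and abolish the fluorescent signal regardless of targeting.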

As targeting signals could lie outside the first 100 amino acids, and because using only the first 100 amino acids may induce targeting artifacts, we investigated in more detail the subcellular localization of the 33 PPR proteins that did not show a single organellar localization. Of the 24 PPR proteins localized in both organelles and the nine proteins appearing outside the organelles, we successfully cloned the whole ORFs and created RFP fusions for 19. Subcellular localizations of these fusions were monitored in protoplasts derived from agro-infiltrated N. benthamiana leaves, observed under epifluorescence and confocal microscopes (examples in Fig. 1B). Results are summarized in Table 1. We confirmed the dual subcellular localization for six of the 11 successfully expressed full-length ORFs whose targeting-peptide fusions had localized to both organelles (see AT5G47460 in Fig. 1B for example). For the other five, as for AT2G36240 (Fig. 1B), we observed a single localization in mitochondria. Among the nine PPR proteins thought to be outside the organelles on the basis of their first 100 amino acids, seven ORFs were successfully cloned, but no cytosolic localization was confirmed: the full-length proteins fused to RFP were systematically targeted to one or both organelles, as exemplified by AT1G06150 and AT1G06580 in Figure 1B. Surprisingly, six of them were localized in both organelles (AT1G06150 in Fig. 1B). Altogether, 12 PPR proteins were verified as being localized in both mitochondria and plastids using the full-length protein.
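The rules given in the Table 1 legend for combining the targeting-peptide (TP) and full-length (FL) fusion results into a tentative conclusion can be written as a small decision function. In this Python sketch, "no signal", "c.u." and "n.a." are all represented as None, and the predicted localization is passed in as a single value because the table does not state how it was derived from the prediction tools; the function and parameter names are hypothetical.

```python
from typing import Optional

def tentative_conclusion(tp: Optional[str], fl: Optional[str],
                         predicted: Optional[str]) -> str:
    """Sketch of the Table 1 legend rules.

    tp, fl: localization seen with the targeting-peptide or full-length RFP
            fusion ("M", "C", "M/C", "N/Ct", ...), or None when there is no
            usable observation (no signal, cloning unsuccessful, not attempted).
    predicted: a single predicted localization such as "M" or "C", or None.
    """
    if fl is not None:              # rule 1: the full-length fusion is taken as true
        return fl
    if tp in ("M", "C"):            # rule 2: single organelle seen with the TP fusion
        return tp
    if tp == "M/C":                 # rule 3: dual TP signal, reported as probable
        return "m/c"
    if tp is None and predicted:    # rule 4: no observation at all, use the prediction
        return "p" + predicted
    return "-"                      # no conclusion (e.g. TP seen in nucleus/cytosol only)

print(tentative_conclusion("N/Ct", "C", None))  # C   (At4g18840)
print(tentative_conclusion("M/C", None, None))  # m/c (At2g41080, FL gave no signal)
print(tentative_conclusion(None, None, "M"))    # pM  (At1g09410)
```

For a row such as At1g18485 (TP signal in nucleus and cytosol, FL cloning unsuccessful) the function returns "-"; Table 1 leaves some of these cells blank instead.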

Integrative overview of the subcellular localizations of the PPR protein family. To provide a general overview of the localization of the whole PPR protein family, we combined our results, covering one-third of the family, with all available data from published studies and accessible databases (Table 2).

We first re-examined the localization predictions for the 458 PPR proteins encoded in the Arabidopsis genome using six available bioinformatic prediction tools: Predotar, 32 TargetP, 31 iPSORT, 38 LOCtree, 39 MultiLoc, 40 and AtSubP. 41 Although they rely on distinct algorithms, these tools largely provide similar results, and Table 2 gives a single localization prediction that aggregates the six results according to the rule stated in the Table 2 legend. Overall, 65% and 17% of the PPR proteins are predicted to function in mitochondria and in plastids, respectively. For the remaining 18%, the results are unclear, either because a majority of the programs were unable to provide an organellar prediction or because they provided widely diverging organellar predictions.
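The exact aggregation rule is stated in the Table 2 legend and is not reproduced in this section. As a hypothetical illustration only, a strict-majority vote over the six tools already separates the two failure cases mentioned above (too few organellar calls, or diverging calls); the function name and the majority threshold are assumptions, not the authors' rule.

```python
from collections import Counter
from typing import Optional, Sequence

ORGANELLES = ("M", "C")  # mitochondria, plastids

def consensus(calls: Sequence[Optional[str]]) -> str:
    """Hypothetical rule: an organelle is retained only if more than half of
    all tools predict it; otherwise the prediction is reported as unclear."""
    counts = Counter(call for call in calls if call in ORGANELLES)
    for organelle in ORGANELLES:
        if counts[organelle] > len(calls) / 2:
            return organelle
    return "unclear"

print(consensus(["M", "M", "M", "M", "C", None]))     # M
print(consensus(["M", "M", "C", "C", None, None]))    # unclear (diverging calls)
print(consensus([None, None, None, None, "C", "C"]))  # unclear (too few calls)
```

Requiring a majority of all six tools, rather than of the tools that return an organellar call, is what makes a mostly silent panel count as unclear.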

We also added the growing body of proteomic data identifying organelle proteins in Arabidopsis mitochondria (SUBA3 5) and chloroplasts (SUBA3, 5 AT_Chloro, 42 PPDB 43), as well as localization data obtained from maize chloroplasts (PPDB 43) and rice mitochondria, 44 following the recent concept of orthoproteomics. 45 As published, 12 the high level of orthology observed between members of the PPR families in A. thaliana and Oryza sativa suggests that both their function and their


www.landesbioscience.com 



RNA Biology 



1559 



Table 1. Subcellular localization study of 166 PPR proteins with ambiguous prediction data

PPR          Gene model      TargetP   Predotar   TP fusion   FL fusion   Conclusion
At1g01970    AT1G01970.1     M         none       M           C           C
At1g02420    AT1G02420.1     ER        M          M/C         c.u.        m/c
At1g04840    AT1G04840.1     none      M          C           n.a.        C
At1g05670    AtPPR_1g05670   M         none       N/Ct        M/C         M/C
At1g06150    AtPPR_1g06150   C         none       N/Ct        M/C         M/C
At1g06580    AT1G06580.1     M         ER         N/Ct + M    M/C         M/C
At1g08610    AT1G08610.1     none      C          no signal   n.a.
At1g09190    AT1G09190.1     M         none       M           n.a.        M
At1g09410    AT1G09410.1     M         M          no signal   n.a.        pM
At1g09900    AT1G09900.1     none      C          M           n.a.        M
At1g10270    AT1G10270.1     M         M          M           n.a.        M
At1g10330    AT1G10330.1     ER        none       no signal   n.a.        -
At1g11290    AT1G11290.1     C         none       C           n.a.        C
At1g14470    AT1G14470.1     ER        none       C           n.a.        C
At1g15480    AtPPR_1g15480   M         ER         no signal   n.a.        -
At1g18485    AT1G18485.1     C         ER         N/Ct        c.u.
At1g19290    AT1G19290.1     M         ER         M           n.a.        M
At1g19720    AT1G19720.1     none      none       M           n.a.        M
At1g20230    AT1G20230.1     none      none       M           n.a.        M
At1g22830    AT1G22830.1     none      M          M           n.a.        M
At1g25360    AT1G25360.1     M         none       M/C         M/C         M/C
At1g31430    AT1G31430.1     ER        none       M           n.a.        M
At1g31790    AT1G31790.1     C         none       C           n.a.        C
At1g31840    AtPPR_1g31840   ER        none       M           n.a.        M
At1g31920    AT1G31920.1     none      none       no signal   n.a.
At1g33350    AT1G33350.1     none      C          M           n.a.        M
At1g50270    AT1G50270.1     none      none       M           n.a.        M
At1g53330    AT1G53330.1     M         none       c.u.        n.a.        -
At1g56570    AT1G56570.1     C         none       M           n.a.        M
At1g59720    AT1G59720.1     C         none       C           n.a.        C
At1g60770    AT1G60770.1     none      M          M           n.a.        M
At1g62260    AT1G62260.1     ER        M          M           n.a.        M
At1g62590    AT1G62590.1     M         none       M/C         M           M
At1g63330    AT1G63330.1     none      none       M           n.a.        M
At1g63400    AT1G63400.1     M         none       M/C         M           M
At1g64100    AtPPR_1g64100   C         none       M/C         c.u.        m/c
At1g68930    AT1G68930.1     M         none       M           n.a.        M
At1g69290    AtPPR_1g69290   ER        ER         M           n.a.        M
At1g71490    AT1G71490.1     none      C          M           n.a.        M
At1g73710    AT1G73710.1     M         none       C           n.a.        C
At1g74400    AT1G74400.1     none      M          M/C         M           M
At1g74580    AT1G74580.1     none      none       no signal   n.a.
At1g74630    AT1G74630.1     ER        C          no signal   n.a.
At1g76280    AtPPR_1g76280   M         ER         M           n.a.        M
At2g01360    AtPPR_2g01360   ER        ER         no signal   n.a.
At2g01740    AT2G01740.1     none      M          M           n.a.        M
At2g02750    AT2G02750.1     M         M          no signal   n.a.        pM
At2g04860    AT2G04860.1     none      M          M           n.a.        M
At2g06000    AT2G06000.1     none      M          C           n.a.        C
At2g13600    AT2G13600.1     none      none       M           n.a.        M
At2g15820    AT2G15820.1     C         none       C           n.a.        C
At2g15980    AT2G15980.1     none      none       no signal   n.a.
At2g16880    AT2G16880.1     none      none       no signal   n.a.
At2g20540    AT2G20540.1     none      none       M           n.a.        M
At2g21090    AT2G21090.1     M         none       M           n.a.        M
At2g22070    AT2G22070.1     none      C          M           n.a.        M
At2g26790    AT2G26790.1     M         ER         M           n.a.        M
At2g27610    AT2G27610.1     M         none       M           n.a.        M
At2g28050    AT2G28050.1     none      M          M           n.a.        M
At2g32630    AT2G32630.1     M         none       M           n.a.        M
At2g33680    AT2G33680.1     none      none       c.u.        n.a.
At2g33760    AT2G33760.1     none      C          C           n.a.        C
At2g34400    AT2G34400.1     none      M          M/C         c.u.        m/c
At2g35130    AtPPR_2g35130   ER        none       C           n.a.        C
At2g36240    AT2G36240.1     none      none       M/C         M           M
At2g36730    AT2G36730.1     none      none       no signal   n.a.
At2g37230    AT2G37230.1     M         none       M           n.a.        M
At2g37310    AT2G37310.1     none      none       C           n.a.        C
At2g39620    AtPPR_2g39620   M         ER         C           n.a.        C
At2g40720    AT2G40720.1     none      none       M           n.a.        M
At2g41080    AtPPR_2g41080   C         none       M/C         no signal   m/c
At2g41720    AT2G41720.1     none      C          C           n.a.        C
At2g44880    AT2G44880.1     ER        none       N/Ct        M/C         M/C
At2g45350    AT2G45350.1     none      C          c.u.        n.a.
At3g01580    AT3G01580.1     M         none       M           n.a.        M
At3g02010    AT3G02010.1     M         none       M           n.a.        M
At3g05240    AT3G05240.1     none      M          no signal   n.a.        -
At3g06920    AT3G06920.1     none      none       M           n.a.        M
At3g08820    AT3G08820.1     none      C          M/C         M           M
At3g09060    AT3G09060.1     none      M          M           n.a.        M
At3g09650    AT3G09650.1     C         none       C           n.a.        C
At3g12770    AT3G12770.1     none      none       M           n.a.        M
At3g14330    AT3G14330.1     none      M          M           n.a.        M
At3g15130    AT3G15130.1     none      M          M           n.a.        M
At3g15930    AT3G15930.1     none      M          M           n.a.        M
At3g16610    AT3G16610.1     M         none       M           n.a.        M
At3g18840    AT3G18840.2     C         none       N/Ct        c.u.
At3g20730    AtPPR_3g20730   ER        ER         M           n.a.        M
At3g21470    AT3G21470.1     none      none       M/C         c.u.        m/c
At3g23020    AT3G23020.1     none      C          c.u.        n.a.        -
At3g23330    AT3G23330.1     none      M          M/C         c.u.        m/c
At3g25970    AT3G25970.1     ER        none       M           n.a.        M
At3g26540    AT3G26540.1     none      C          M           n.a.        M
At3g28640    AT3G28640.1     M         none       no signal   n.a.        -
At3g28660    AT3G28660.1     M         none       no signal   n.a.
At3g29290    AtPPR_3g29290   M         none       no signal   n.a.
At3g42630    AT3G42630.1     none      M          C           n.a.        C
At3g46610    AT3G46610.1     none      ER         C           n.a.        C
At3g46790    AT3G46790.1     C         none       C           n.a.        C
At3g47530    AT3G47530.1     none      none       M/C         M/C         M/C
At3g47840    AT3G47840.1     M         none       C           n.a.        C
At3g48810    AT3G48810.1     ER        none       M           n.a.        M
At3g49240    AT3G49240.1     M         none       M/C         M/C         M/C
At3g49710    AT3G49710.1     none      none       N/Ct        M/C         M/C
At3g49740    AT3G49740.1     C         none       M           n.a.        M
At3g50420    AT3G50420.1     none      none       M/C         M/C         M/C
At3g53170    AtPPR_3g53170   none      none       C           n.a.        C
At3g56550    AT3G56550.1     none      C          no signal   n.a.        -
At3g57430    AT3G57430.1     C         ER         C           n.a.        C
At3g58590    AT3G58590.1     none      none       M           n.a.        M
At3g62890    AtPPR_3g62890   C         none       N/Ct        M/C         M/C
At4g01570    AT4G01570.1     C         none       C           n.a.        C
At4g02750    AT4G02750.1     M         none       M           n.a.        M
At4g04370    AT4G04370.1     none      C          M/C         c.u.        m/c
At4g08210    AT4G08210.1     none      none       M           n.a.        M
At4g11690    AT4G11690.1     ER        ER         M           n.a.        M
At4g13650    AT4G13650.1     none      M          no signal   n.a.        pM
At4g14820    AT4G14820.1     none      C          M           n.a.        M
At4g14850    AT4G14850.1     C         none       M/C         c.u.        m/c
At4g15720    AT4G15720.1     none      none       C           n.a.        C
At4g16470    AT4G16470.1     ER        M          M           n.a.        M
At4g18840    AtPPR_4g18840   none      C          N/Ct        C           C
At4g20090    AT4G20090.1     C         ER         M           n.a.        M
At4g20740    AT4G20740.1     C         none       C           n.a.        C
At4g21065    AT4G21065.1     ER        none       no signal   n.a.
At4g21880    AT4G21880.1     ER        M          M           n.a.        M
At4g22760    AT4G22760.1     ER        M          M           n.a.        M
At4g28010    AT4G28010.1     M         none       M           n.a.        M
At4g30700    AT4G30700.1     none      M          M/C         M/C         M/C
At4g33170    AT4G33170.1     C         none       M/C         c.u.        m/c
At4g37170    AT4G37170.1     M         none       M           n.a.        M
At4g38010    AT4G38010.1     none      none       no signal   n.a.
At5g03800    AT5G03800.1     ?         none       M/C         c.u.        m/c
[illegible]  [illegible]     C         none       C           n.a.
At5g06540    AT5G06540.1     ER        M          C           n.a.        C
At5g08310    AtPPR_5g08310   none      M          no signal   n.a.
At5g08490    AT5G08490.1     M         none       no signal   n.a.
At5g08510    AT5G08510.1     none      none       M           n.a.        M
At5g10690    AT5G10690.1     C         ER         C           n.a.        C
At5g14080    AtPPR_5g14080   M         none       M/C         c.u.        m/c
At5g15300    AT5G15300.1     none      none       no signal   n.a.
At5g16860    AT5G16860.1     none      M          no signal   n.a.
[illegible]  [illegible]     C         none       M           n.a.        M
[illegible]  [illegible]     M         none       M           n.a.        M
[illegible]  [illegible]     none      none       M/C         c.u.        m/c
[illegible]  [illegible]     none      none       no signal   n.a.
[illegible]  [illegible]     C         none       C           n.a.        C
[illegible]  [illegible]     none      M          no signal   n.a.
[illegible]  [illegible]     none      ?          no signal   n.a.
[illegible]  [illegible]     none      none       M           n.a.        M
[illegible]  [illegible]     none      none       M           n.a.        M
[illegible]  [illegible]     none      none       M/C         c.u.        m/c
[illegible]  [illegible]     none      M          no signal   n.a.
[illegible]  [illegible]     none      M          M           n.a.        M
[illegible]  [illegible]     ?         none       M/C         ?           ?
[illegible]  ?               ?         none       C           n.a.        C
At5g50990    AtPPR_5g50990   ?         ?          no signal   ?           ?
At5g52630    AT5G52630.1     none      none       M           n.a.        M
At5g55840    AtPPR_5g55840   none      none       M           n.a.        M
At5g56310    AT5G56310.1     M         none       M           n.a.        M
At5g59600    AT5G59600.1     none      none       C           n.a.        C
At5g59900    AT5G59900.1     M         none       M           n.a.        M
At5g65570    AT5G65570.1     M         none       no signal   n.a.
At5g65820    AT5G65820.1     M         none       M           n.a.        M
At5g66520    AT5G66520.1     none      none       C           n.a.        C
At5g67570    AtPPR_5g67570   none      none       no signal   n.a.





Manually curated Arabidopsis PPR gene models were used. 12 Most of them are identical to TAIR v10 gene models, but 22 models differ and are indicated by their AtPPR codes. Localizations predicted by Predotar v1.03 and TargetP v1.1 are listed. Experimental fluorescent signals observed in protoplasts expressing the Targeting Peptide (TP) or the Full-Length (FL) protein fused to RFP are shown. Each construct was observed independently by two of the authors on at least three independent agro-infiltrations. For each PPR, a tentative conclusion is proposed according to the following rules: (1) if available, the localization of the FL-protein fusion is considered the true localization; (2) if a mitochondrial or a chloroplastic localization was observed for the targeting peptide and no observation was recorded for the full-length protein, the TP result is given as the conclusion; (3) if a dual localization was observed for the targeting peptide and no observation was obtained with the full-length protein, the TP result is given as probable, in lowercase; (4) if no experimental observation was obtained, the predicted localization is given, preceded by "p". M, mitochondria; C, chloroplasts; N/Ct, nucleus and cytoplasm; M/C, dual localization in mitochondria and chloroplasts; pM, predicted in mitochondria (conclusion column); pC, predicted in chloroplasts (conclusion column); m, probably in mitochondria (conclusion column); c, probably in chloroplasts (conclusion column); m/c, probably in mitochondria and chloroplasts (conclusion column); -, no conclusion; c.u., cloning unsuccessful; n.a., not attempted.



subcellular localization are largely conserved between species, even
between monocotyledons and dicotyledons. Overall, 83 (about
18%) of the Arabidopsis PPR proteins, or their orthologs in other
species, were identified in either the plastid or the mitochondrial
proteome, providing a very useful set of PPR protein localization
data (Table 2). Three and five PPR proteins were identified
during the proteomic characterization of Arabidopsis nuclear and
cytosolic proteins, respectively. 46-49 Surprisingly, 28 PPR proteins
were detected in plasma membrane or vacuole extracts. 50-56
Because none of these membrane-associated PPR proteins has been
functionally characterized, these observations cannot yet be
explained. They may reflect intrinsic technical limitations of
proteomic approaches; alternatively, their number may point to
unsuspected localizations and functions. Nevertheless, proteome-based
localizations support many of the bioinformatic predictions:
48 (71%) of them matched the available predictions (Table 3).

A growing number of PPR proteins have been characterized in planta,
either in dedicated studies (see references in Table 2) or in
systematic studies, namely the work reported here and three previous
ones (two published 11,57 and one available as unpublished data in
SUBA3; Table 2). Localizations were usually determined by microscopy
of fusions between a fluorescent reporter and either the PPR protein
or, when one was suspected, its putative targeting peptide. Including
the work reported here, 208 PPR localizations have been determined
experimentally using fluorescent fusion proteins, and these largely
agree with both bioinformatic and proteomic approaches (Tables 2
and 3). Among the 159 PPR proteins for which both experimental
localization data based on protein fusions and bioinformatic
predictions are available, 135 (85%) have the same localization. In
addition, among the 36 PPR proteins that were both identified in
sub-proteomes and localized experimentally with fluorescent protein
fusions, 30 (83%) gave compatible results. The last set of data comes
from the identification of the molecular functions of PPR proteins
by reverse genetics, which provides very important information about
their localization (Table 2). As widely reported in the literature,
PPR proteins regulate gene expression through direct interaction with
specific RNAs. A literature survey indicates that molecular roles have
been assigned to 68 PPR proteins (Table 2), acting in plastids (31),
in mitochondria (34), in both mitochondria and plastids (1), or in the
nucleus (2). These reverse genetics studies provide very strong
evidence of PPR localization and can be considered to reflect the
true localization. When compared with this very high-quality data set,
our data, as well as all fluorescent protein localization data,
appeared very highly correlated
Table 2. Prediction and experimental localization data of Arabidopsis thaliana PPR proteins

Columns: AGI; Gene Annotation (1); Domains (2); Localization: Predictions (3), Proteomics (4), Experimental (5), Conclusion (6); EMB (7); Molecular Function (localization) (8); References


At1g01970 


PPR containing protein 


P 


M 






c 






"this report 


At1g02060 


PPR containing protein 


P 


M 






pM 








At1g02150 


PPR containing protein 


HfiH 


M 


C (At , t,c , Zm*) 




C 






a AT_Chloro, b Kong et al 2011, c PPDB


At1g02370 


PPR containing protein 


p 


M 


M (At") 




M 






"Klodmann et al 2011 


At1g02420 


PPR containing protein 


p 


M 




m/c" 


m/c 






"this report 


At1g03100 


PPR containing protein 


p 


M 


Ct (At") 




pM 






"Hummel et al 2012 


At1g03510 


PPR containing protein 


PLS-E 


C 






pC 








At1g03540 


PPR containing protein 


PLS-E 


M 






pM 








At1g03560 


PPR containing protein 


HHHI 


M 






pM 








At1g04840


PPR containing protein 


PLS-E-DYW 




PM (At ) 


c 








Mitra et al 2009, this report


At1g05600 


EMB3101 










pM 


confirmed 8 




"SeedGenes 


At1g05670 


PPR containing protein 


P 


M 




M/C" 


m/c 






"this report 


At1g05750 


CLB19/PDE247 


PLS-E 


c 




C** 


C 




Editing rpoA and clpP (C")


"Chateigner-Boutin et al 2008, 
"in house SUBA3 


At1g06140 


MEF3 


PLS-E 


M 






M 




Editing atp4 (M)


Verbitskiy et al 2012


At1g06150 


EMB1444 


PLS-E 


C 




M/C" 


m/c 


potential 11 




"this report, "Cushing et al 2005 


At1g06270 


PPR containing protein 


P 


M 




ER/C" 








"Narsai et al 2011 


At1g06580 


PPR containing protein 


HI 


M 




M/C" 


m/c 






"this report 


At1g06710






m 




M 






Processing and stability of nad4 (M)


Haïli et al 2013


At1g07590


PPR containing protein 


HUH 


H 






pM 








At1g07740 


PPR containing protein 


p 


M 






pM 








At1g08070 


OTP82 


PLS-E-DYW 


C 




C" 


C 




Editing ndhG, ndhB (C b,c)


a in house SUBA3, b Hammani et al 2009,
c Okuda et al 2010


At1g08610 


PPR containing protein 


P 


c 






pC








At1g09190


PPR containing protein 


PLS-E 






M* 


m 






"this report 


At1g09220 


PPR containing protein 


PLS-E 


M 






pM 








At1g09410


PPR containing protein 


PLS-E-DYW 


M 






c 








At1g09680 


PPR containing protein 


P 


M 




M" 


M 






"Narsai et al 2011 


At1g09820 


PPR containing protein 


P 


M 






pM 








At1g09900 


PPR containing protein 


P 


c 


C(Zm") 


M b 


m/c 






"PPDB, "this report 


At1g10270










N , M" 


M/N 


confirmed




a Ding et al 2006, b Narsai et al 2011,
c this report, d SeedGenes


At1g10330


PPR containing protein 


PLS-E 


m 






pM 








At1g10910 


EMB3103 


P 


M 


C (Zm") 




c 


confirmed" 




"PPDB, "SeedGenes 


At1g11290 


CRR22 


PLS-E-DYW 








c 




Editing ndhB, ndhD, rpoB (C c)


a this report, b in house SUBA3,
c Okuda et al 2009


At1g11S30 


PPR containing protein 


P 


M 


M (At") 




M 






"Heazlewood et al 2004 


At1g11710


PPR containing protein 


P 


M 






pM 








At1g11900


PPR containing protein 


P


M 






pM 








At1g12300 


PPR containing protein 


P 


M 






pM 








At1g12620


PPR containing protein 


P 


M 






pM 








At1g12700 


RPF1 


P 


M 




M" 


M 




Processing nad4 transcript (M a)


a Hölzle et al 2011


At1g12775 


EMB1586 


Hlfll 


M 






pM 


confirmed" 




"SeedGenes 


At1g13040 


PPR containing protein 


p 


M 


V (At a)




pM 






"Jaquinod et al 2007 


At1g13410 


PPR containing protein 


PLS-E 


m 






pM 








At1g13630.1 


PPR containing protein 


P 








- 








At1g13800 


FAC19 




M 






pM 


confirmed" 




"Yu etal J 2011 


At1g14470 


PPR containing protein 


PLS 


m 




C" 


c 






"this report 


At1g15480 


PPR containing protein 


P 


m 


M (At a) PM (At b)




M






a Klodmann et al 2011, b Mitra et al 2009


At1g15510


AtECB2/VAC1


PLS-E-DYW 


M 






C 




Editing accD and ndhF (C b,c)


a in house SUBA3, b Yu et al 2009,
c Tseng et al 2010


All r» 1 £1 Jt DA 

Miigio^ou 


pseudogene


PLS-E-DYW


M 






pM 








At1g16830


PPR containing protein 


p 


M 






pM 








At1g17630


PPR containing protein 


PLS-E 


M 






pM 








At1g18485


PPR containing protein 


PLS-E-DYW 


C 






pC








At1g18900


PPR containing protein 


P-D 


M 






pM 








At1g19290


PPR containing protein 


P 


m 




M 


M 






this report, in house SUBA3 


At1g19520 


NFD5 


P 


M 


PM (At") 




pM 


potential b




a Zhang et al 2011, b Portereiko et al 2006


At1g19720 


PPR containing protein 


PLS-E-DYW 




C (At" 6 , Zm") 


M* 


m/c






a Kong et al 2011, b PPDB, c this report


At1g20230 


PPR containing protein 


PLS-E-DYW 






M" 


m 






"this report 


At1g20300 


PPR containing protein 


P 


M 




M" 


M






a Narsai et al 2011


At1g22830 


PPR containing protein 


PLS-E 


M 




M* 


M 






"this report 


At1g22960 


PPR containing protein 


P 


M 






pM 








At1g25360 


PPR containing protein 


PLS-E-DYW 


M 




M/C* 


m/c 






"this report 


At1g26460 


PPR containing protein 


p 


M 


M (At a,b, Os c)
PM (At d)




M






a Heazlewood et al 2004, b Klodmann et al 2011,
c Huang et al 2009, d Zhang et al 2011


At1g26500 


PPR containing protein 


P 


M 






pM 








At1g26900 


PPR containing protein 


PLS-E 


M 






pM 








At1g28020 


PPR containing protein 


■Km 


M 






pM 








At1g28690 


PPR containing protein 


PLS-E 


M 






pM 








At1g29710 


PPR containing protein 


PLS-E-DYW 


M 






pM 








At1g30290 


pseudogene 


P 


M 






pM 








At1g30610 


EMB2279 


■w 


C 


C(Zm") 




C 


confirmed " 




"PPDB, "SeedGenes 


At1g31430 


PPR containing protein 


PLS-E 


c 




M* 


m 






this report 


At1g31790 


PPR containing protein 


PLS 


c 




C" 


C 






"this report 


At1g31840 


PPR containing protein 


P 






M" 


m 






"this report 


At1g31920 


PPR containing protein 


PLS-E-DYW 




C (Zm a ) 




c 






"PPDB 


At1g32415 


PPR containing protein 


PLS-E 


M 






pM 








At1g33350 


PPR containing protein 


PLS-E 


m 




M' 


M 






"this report 


At1g34160 


OGR1 


PLS-E-DYW 


M 




M* 


M 




Editing nad4, nad2, ccmC,
cox2, cox3 (M a)


"Kim et al 2009 


At1g43010 


PPR containing protein 




M 






pM 








At1g43980 


PPR containing protein 


PLS-E 


M 






pM 








At1g47580 


DYW1 


PLS-E-DYW 


c 






C 




Editing ndhD (C)


a Boussardon et al 2012, b in house SUBA3


At1g50270 


PPR containing protein 


PLS-E 


M 




M* 


M 






"this report 


At1g51965 


ABO5


HUH 


M 


C(At") 


M° 


M 




Splicing nad2 intron3 (M b)


a AT_Chloro, b Liu et al 2010


At1g52620 


PPR containing protein 


p 


M 


C(At*) 




c 






"PPDB 


At1g52640 


PPR containing protein 


p 


M 






pM 








At1g53330 


CBJ265 


p 


M 






pM 


confirmed" 




"Kocabek et al 2006 


At1g53600 


PPR containing protein 


PLS-E 


M 






pM 








At1g55630 


PPR containing protein 


P 


M 






pM 








At1g55890 


PPR containing protein 


P 


M 


M (At a,b) PM (At c)




M 






a Heazlewood et al 2004, b Klodmann et al 2011,
c Mitra et al 2009


At1g56570 


PGN 


PLS-E 






M" 6 


M






a Laluk et al 2011, b this report
Table 2. Prediction and experimental localization data of Arabidopsis thaliana PPR proteins (continued)


At1g56690 


PPR containing protein 


PLS-E-DYW 


M 






pM 








At1g59720


CRR28 


PLS-E-DYW 


c 




M a, C b,c


C




Editing ndhB, ndhD (C d)


a Lurin et al 2004, b in house SUBA3,
c this report, d Okuda et al 2009


At1g60770


PPR containing protein 


P 


m 


M (At a,b, Os c)
PM (At d)


M e


M






a Heazlewood et al 2004, b Klodmann et al 2011,
c Huang et al 2009, d Mitra et al 2009, e this report


At1g61870 


PPR336 


P 


M 


M (At" * c ) 


M a 


M






a Heazlewood et al 2004, b Uyttewaal et al 2007,
c Klodmann et al 2011, d Lurin et al 2004


At1g62260 


MEF9 


PLS-E 






M" 


M 




Editing nad7 (M b)


"this report, "Takenaka et al 2010 


At1g62350 


THA8-UKE3 


P-D 


M 






pM 
Table 2. Prediction and experimental localization data of Arabidopsis thaliana PPR proteins (continued)

pM








At3g57430 


OTP84 


PLS-E-DYW 


c 


PM (At ) 




_ 




Editing psbZ, ndhB, ndhF (C d)


a Li et al 2012, b this report, c in house SUBA3,
d Hammani et al 2009


At3g58590 


PPR containing protein 


P 


m 




M* 


M 






"this report 


At3g59040.1 


PPR containing protein 


P 


C 


C (Zm") 




C 






"PPDB 


At3g60050 


PPR containing protein 


P 


M 






pM 










PPR containing protein 




M 


M (At B ) 




M 






Heazlewood et al 2004 


At3g60980 


PPR containing protein 


P 


H 






pM 








At3g61170


PPR containing protein 


PLS-E-DYW 


M 






pM 








At3g61360 


PPR containing protein 


P 


M 






pM 








Aio rt ci con 
Hijgo 1 DzU 


PPR containing protein 


P 


M 






pM 








At3g62470 


PPR containing protein 




M 






pM 








rM0yOZD<*U 


PPR containing protein 


p 








pM 








At3g62890 




PLS-E-DYW 


Q 




M/C* C b 








this report, in house SUBA3 


At3g63370


OTP86 


PLS-E-DYW 


c 




C* 


c 




Editing rps14 (C b)


a in house SUBA3, b Hammani et al 2009


At4g01030 


PPR containing protein 


PLS-E-DYW 


M 


C (Zm a ) 










"PPDB 


At4g01400.1 


PPR containing protein 


P-D 


M 


PM (At ) 




pM 






"Mitra et al 2009 


At4g01570 


PPR containing protein 


P 


M 




c a 


c 






"this report 


At4g01990 


PPR containing protein 










pM 








At4g02750 


PPR containing protein 


PLS-E-DYW 


M 




M" 


M 






"Lurin et al 2004, "this report 


At4g02820 


PPR containing protein 


P 






M" 


M 






a Narsai et al 2011


At4g04370 


PPR containing protein


PLS-E 


m 




M/C" 6 


M/C 






"this report, "in house SUBA3 


At4g04790 


PPR containing protein 


HHflfl 


H 






pM 








At4g08210 


PPR containing protein 


PLS-E 


m 




M a 


M 






"this report 


At4g11690


PPR containing protein 


P 


m 




M" 


M 






"this report 


Ml**y 1 JoDU 


PPR containing protein 


PLS-E-DYW


M 






pM 








At4g14050


PPR containing protein 


PLS-E-DYW 


M 






pM 








At4g14170 


PPR containing protein 


PLS-E 


M 






pM 








At4g14190 


PPR containing protein 


HHH 


M 






pM 








At4g14820


PPR containing protein 


PLS-E-DYW 


c 




M" 


m 






"this report 


At4g14850


LOI1/MEF11 


PLS-E-DYW 


M 




M a, M/C b


Mb 




Editing cox3, nad4, ccb203 (M c ) 


a Tang et al 2010, b this report,
c Verbitskiy et al 2010


At4g15720


PPR containing protein 


PLS-E-DYW 






C a 


c 






"this report 


At4g16390


SVR7/RNA binding P67


P-D


c 


C (At a, Zm b)


r*# 


c






a AT_Chloro, b PPDB, c Lurin et al 2004,
d Liu et al 2010


At4g16470


PPR containing protein 


PLS-E 


M 




M" 


M 






"this report 


At4g16835


PPR containing protein 


PLS-E-DYW 


m 






pM 








At4g17616


PPR containing protein 


P 


M 






pM 








At4g17910


PPR containing protein 




M 






pM 








At4g18520


PDM1 






C(Zm") 








Processing rpoA transcript (C ) 


a PPDB, b Hao et al 2010


At4g18750


DOT4 


PLS-E-DYW 


c 


C(Zm') 


C b 


C 






"PPDB, "in house SUBA3 


At4g18840


PPR containing protein 


PLS-E 






c*" 


c 






"this report, "in house SUBA3 


At4g18975.1 


PPR containing protein 


HHMH 


c 






pC








At4g19191 


PPR containing protein 


PLS-E 


M 






pM 








At4g19220


PPR containing protein 


PLS-E 


H 






pM 








At4g19440


PPR containing protein 


P 
















At4g19900


Glycosyl transferase-related


P-D 


m 






pM








At4g20090 


EMB1025 


P 


c 




M' 6 


M 


confirmed c 




a Lurin et al 2004, b this report, c SeedGenes


At4g20740 


EMB3131 


MRU 


c 


PM (At a)


C" 


c 


confirmed c




a Li et al 2012, b this report, c SeedGenes


At4g20770 


PPR containing protein 


PLS-E 


M 






pM 








At4g21065.1 


PPR containing protein 


PLS-E-DYW 
















At4g21170 


PPR containing protein 


P 


M 




M/C" 


M/C 






a Narsai et al 2011


At4g21190 


EMB1417 




U 


C(Zm a ) 




c 


confirmed" 




"PPDB, "SeedGenes 
Table 2. Prediction and experimental localization data of Arabidopsis thaliana PPR proteins (continued)


At4g21300 


PPR containing protein 


PLS-E 


M 


C (Zm") 


C b 


c 






a PPDB, b in house SUBA3


At4g21705 


PPR containing protein 




M 






pM 








At4g21880 


PPR containing protein 


p 


M 




M" 


M 






"this report 


At4g21900 


PRORP3 


P-D






N' 


N 




Processing tRNA and
maturation of RNA (N)


Gobert et al 2010, Gutmann et al 2012


At4g22760 


PPR containing protein 


PLS-E 






M" 


M 






"this report 


At4g25270 


OTP70 


PLS-E 


C 




C" 


C 




Splicing rpoC1 intron (C a ) 


a Chateigner-Boutin et al 2011,
b in house SUBA3


At4g26680 


PPR containing protein 


P 


M 






pM 








At4g26800 


PPR containing protein 




M 






pM 








At4g28010


RPF5


p 


M 




M" 


M 




Processing nad6, atp9, 26S
rRNA (M b)


a this report, b Hauler et al 2013


At4g30700 


MEF29/ZmPPR2263


PLS-E-DYW 


M 




M/C" 


M/C 




Editing nad5, cob (M a)


a Sosso et al 2012


At4g30825 


PPR containing protein 


P 




C (Zm") 




c 






a PPDB


At4g31070 


PPR containing protein 










pM 








At4g31850 

Miig ma ou 
At4g32450


PGR3 
PPR containing protein 


P 

PLS-E 
PLS-E-DYW 


M 




C" 


C 
pM 




Translation stabilisation petL
and ndhA (C b,c)


a Lurin et al 2004, b Yamazaki et al 2004,
c Cai et al 2011




M" 




Editing nad5, nadB (M ) 


a in house SUBA3, b Verbitskiy et al 2012


At4g33170 


PPR containing protein 


PLS-E-DYW 


M 




M/C 


M/C






a this report, b in house SUBA3


At4g33990 


EMB2758 


PLS-E-DYW 


M 






pM 


potential 3 




a SeedGenes 


At4g34830 


MRL1


P 


C 


C (At a,b) PM (At c)




C 




Processing/stabilisation rbcL (C d)


a PPDB, b AT_Chloro, c Li et al 2012,
d Johnson et al 2010


At4g35130 


PPR containing protein 


PLS-E-DYW 


C 




C" 


C 






a in house SUBA3 




















a Millar et al 2001, b Heazlewood et al 2004,


At4g35850 


PPR containing protein 


P 


M 


M (At"", Os'l 




M 






c Klodmann et al 2011, d Taylor et al 2011,
e Huang et al 2009


At4g36680 


PPR containing protein 


P 


M 


M (At") 




M






a Heazlewood et al 2004, b Klodmann et al 2011


At4g37170 


PPR containing protein 


PLS-E-DYW 






M* 


m 






a this report


At4g37380


PPR containing protein 


PLS-E-DYW 


C 






pC 








At4g38010 


PPR containing protein 


PLS-E 


M 






pM 








At4g38150.1 


PPR containing protein 


P 




M (At a)










a Taylor et al 2011


At4g39530 


PPR containing protein 


PLS-E 


M 






pM 








At4g39620 
At4g39952
At5g01110


EMB2453/ZmPPR5
PPR containing protein 
PPR containing protein 


P 

PLS-E 
P 


C 
M 

C 


C (Zm') 




C 
pM 

pC 


confirmed " 


Splicing trnG (C c)


a PPDB, b SeedGenes, c Beick et al 2008












At5g02830 


PPR containing protein 


P 


c 


C (Zm") 




C 






"PPDB 


At5g02860 


PPR containing protein 


P 


M 


C (Zm") 




c 


potential" 




a PPDB, b Myouga et al 2010


At5g03560.2 


PPR containing protein 


P 


M 






pM 








At5g03800 


EMB175 


PLS-E-DYW 


C 


C (Zm") PM (At") 


M/C 1 " 


m/C 


confirmed e




a PPDB, b Keinath et al 2010, c in house SUBA3,
d this report, e SeedGenes


At5g04780 


PPR containing protein 


PLS-E-DYW 




C(At*) 




c






a Kleffmann et al 2004


At5g04810 


ZmPPR4 


P-D 


c 


C (At") 


C b 


C 




Splicing rps12 intron1 (C c)


a PPDB, b this report,
c Schmitz-Linneweber et al 2006


At5g06540 


PPR containing protein 


PLS-E-DYW


M 




C", M/C 


m/c 






a this report, b in house SUBA3


At5g08310


PPR containing protein 


PLS-E 


M 






pM 






this report 


At5g08490


SLG1 


PLS-E 


M 




M" 






Editing nad3 (M a)


"Yuan & Liu 2012 


At5g08510


PPR containing protein 


PLS-E 






M" 


m 






a this report


At5g09450


PPR containing protein 


P 


M 


M (At") 




M 






"Klodmann et al 2011 


At5g09950 


MEF7 


PLS-E-DYW 






M" 


M 




Editing nad2, nad4L, cob,
ccb206 (M b)


a Lurin et al 2004, b Zehrmann et al 2012


At5g10690 


PPR containing protein 


P-D 


- 


Ct (At") 


c" 


C 






"Ito et al 2011, "this report 


At5g11310


PPR containing protein 


P 


M 






pM 








At5g12100 


PPR containing protein 


P 


M 


M (At") 




M






a Tan et al 2009


At5g13230


PPR containing protein 


PLS-E-DYW 


M 




M" 


M 






"Lurin et al 2004 


At5g13270 


RARE1 


PLS-E-DYW 






c* 


C 




Editing accD (C b ) 


"Lurin et al 2004, b Robbins et al 2009 


At5g13770


PPR containing protein 


P 


C 


C (Zm") 




C 






"PPDB 


At5g14080


PPR containing protein 


■H8H 


C 


C(At") 


m/c b


m/C 






'PPDB, "this report 


At5g14770


PPR containing protein 


p 


M 


PM (At") 


M 6 


M 






a Li et al 2012, b Lurin et al 2004


At5g14820


PPR containing protein 


MHM 


M 






pM 








At5g15010


PPR containing protein 


p 


C 






pC 








At5g15280


PPR containing protein 


p 


M 






pM 








At5g15300


PPR containing protein 


PLS-E 




PM (At") 










"Mitra et al 2009 


At5g15340


PPR containing protein 


PLS-E-DYW 


M 




M" 


M 






"Lurin et al 2004 


At5g15980


PPR containing protein 
PPR containing protein 
PPR containing protein 


P 

p 


M 
M 
M 


M (At") PM (At") 




M 
pM 
pM






a Klodmann et al 2011, b Zhang et al 2011


At5g16420
At5g16640












At5g16860


PPR containing protein 


PLS-E-DYW 
















At5g18390


PPR containing protein 


P 


H 






pM 








At5g18475


PPR containing protein 




II 




M" 


M 






"this report 


At5g18950


PPR containing protein 


P 


M 




M' 


M 






"this report 


At5g19020


MEF18 


PLS-E 


C 






M 




Editing nad4 (M a ) 


"Takenaka et al 2010 


At5g21222 


AtC401 
PPR containing protein 
PPR containing protein 


P-D 
P 
p 




C(At") 


m/c' 


m/C






a PPDB, b this report


At5g24830 
At5g25630 




C(Zm) 










a PPDB 


At5g27110 


PPR containing protein 


PLS-E 


M 




M/C" 


m/c 






"in house SUBA3 


At5g27270 


EMB976 
PPR containing protein 
PPR containing protein 


P 

nisi 

P 


c 

M 
M 


C (Zm') 


c" 


c 

pM 
pM 


potential c




a PPDB, b this report, c SeedGenes


At5g27460 
At5g28460 












At5g36300


pseudogene 


■HAH 
















At5g37570 


PPR containing protein 


PLS-E 


M 






pM 








At5g38730 


PPR containing protein 


P 


M 






pM 








At5g39350 


PPR containing protein 


PLS-E 


M 




M* 


M 






"in house SUBA3 


At5g39680 


EMB2744 


PLS-E-DYW 


HH 




M" 


m 


potential c




a Lurin et al 2004, b in house SUBA3, c SeedGenes


At5g39710 


EMB2745 


P 


M 




M" 


M 


potential b




a Narsai et al 2011, b SeedGenes


At5g39980 
At5g40400 


EMB3140 
PPR containing protein 


P 
P 




C (Zm") 




c 


confirmed " 




"PPDB, "SeedGenes 


At5g40405 


PPR containing protein 


PLS-E-DYW 






M" 


m 






"this report 


At5g40410 


PPR containing protein 


PLS-E-DYW 


m 


C (At a,b)




c 






a AT_Chloro, b Kong et al 2011


At5g41170 


PPR containing protein 


■m 


M 






pM 








At5g42310 


Ortholog of Z. mays
CRP1


p 


M 


C(At") 




C 




Translation stabilisation petA
and psaC (C b,c)


a PPDB, b Fisk et al 1999,
c Schmitz-Linneweber et al 2005


At5g42450 


PPR containing protein 


PLS-E 
















At5g43790 


PPR containing protein 


PLS-E 






m/c" 


m/c 






'this report 


At5g43820 


PPR containing protein 


P 


M 






pM 
Table 2. Prediction and experimental localization data of Arabidopsis thaliana PPR proteins (continued)


At5g44230 


PPR containing protein 


PLS-E-DYW 


C 




pC








At5g46100 


PPR containing protein 




M 




pM 








At5g46460 


PPR containing protein 


PLS-E-DYW 


M 




pM 








At5g46580 


PPR containing protein 


P-D 


C 


C (At" b , Zm") 


C 






a AT_Chloro, b PPDB


At5g46680 


PPR containing protein 


P 


M 


PM (At a ) M° 


M 






a Li et al 2012, b this report


At5g47360 


PPR containing protein 


P 


M 




pM 








At5g47460 


PPR containing protein


PLS-E 


M 


M/C a 


m/c 






"this report 


At5g48730 


PPR containing protein 


HHflfl 


c 


C (Zm a ) 


C 






a PPDB 


At5g48910 


LPA66 


PLS-E-DYW 


c 


C a, M b


m/C 




Editing psbF (C c ) 


a this report, b in house SUBA3, c Cai et al 2009


At5g50280 


EMB1006 


|> 


c 


C (Zm a ) PM (At 6 ) 


C 


potential c




a PPDB, b Mitra et al 2009, c SeedGenes


At5g50390 


EMB3141 


PLS-E-DYW 


c 


C a 


C 


potential b




a in house SUBA3, b SeedGenes


At5g50990 


PPR containing protein 


PLS-E-DYW 














At5g52630


MEF1 


PLS-E-DYW 


c 


c a, M b


M 




Editing rps4, nad7, nad2 (M c ) 


a Lurin et al 2004, b this report, 
c Zehrmann et al 2009 


At5g52850 


PPR containing protein 


PLS-E-DYW 














At5g55740 


CRR21 


PLS-E 


c 


M a, C b


C 




Editing ndhD (C c)


a Lurin et al 2004, b in house SUBA3,
c Okuda et al 2007


At5g55840 


PPR containing protein 






M" 


M 






"this report 


At5g56310 


PPR containing protein 


PLS-E 


M 


M" 


M 






"this report 


At5g57250 


PPR containing protein 


p 


M 




pM








At5g59200 














Editing rpl23 (C ) 


in house SUBA3, Hammani et al 2009 


At5g59600 


PPR containing protein 


PLS-E 




& 


c 






"this report 


At5g59900


PPR containing protein 


P 




M* 


M 






"this report 


At5g60960 


PNM1 


P 


M 


M <0s") M/N b , (Uf 


M/N 


confirmed 1 ' 




a Huang et al 2009, b Hammani et al 2011,
c Narsai et al 2011


At5g61370 


PPR containing protein 


P 


M 


M" 


M






a Narsai et al 2011


At5g61400 


PPR containing protein 


P 


M 




pM 








At5g61800 


PPR containing protein 


PLS-E 


M 


PM (At a ) 


pM 






"Li et al 2012 


At5g61990 


PPR containing protein 


P 


M 




pM 








At5g62370 


PPR containing protein 


P 


M 




pM 








At5g64320 


PPR containing protein 




H 




pM 








At5g65560 


PPR containing protein 


p 


M 




pM 








At5g65570 


PPR containing protein 


PLS-E-DYW 


m 


PM (At 8 ) 


pM 






"Mitra et al 2009 


At5g65820 


Zmempp4 ortholog 2 


P 


M 


M (Zm a At 0 ) 


M






a Gutiérrez-Marcos et al 2007, b this report


At5g66500 


PPR containing protein 


PLS-E 


H 




pM 








At5g66520 


CREF7 


PLS-E-DYW 




C 


C 




Editing ndhB (C b)


"this report, b Yagi et al 2013 


At5g66631 


PPR containing protein 




c 




PC 








At5g67570 


EMB1408/DG1/ 
ZmPPR8852 


p 




C (Zm a ) C b 


c 






a PPDB, b Chi et al 2008



(1) Functional annotations were obtained from the TAIR website using the Arabidopsis Genome Initiative (AGI) genome release version 10. ABO5, ABA OVERLY
SENSITIVE; AtECB, EARLY CHLOROPLAST BIOGENESIS; BIR, BSO-INSENSITIVE-ROOTS; CLB, CHLOROPLAST BIOGENESIS; CREF, CHLOROPLAST RNA EDITING
FACTOR; CRR, CHLORORESPIRATORY REDUCTION; DG, DELAYED GREENING; DOT, DEFECTIVELY ORGANIZED TRIBUTARIES; EMB, EMBRYO DEFECTIVE;
FAC, EMBRYONIC FACTOR; GRP, GLUTAMINE-RICH PROTEIN; GUN, GENOMES UNCOUPLED; HCF, HIGH CHLOROPHYLL FLUORESCENCE; LOI, LOVASTATIN
INSENSITIVE; LOJ, LATERAL ORGAN JUNCTION; LPA, LOW PSII ACCUMULATION; MEF, MITOCHONDRIAL RNA EDITING FACTOR; MPR25, MITOCHONDRIAL PPR
25; MTSF, MITOCHONDRIAL STABILITY FACTOR; NFD, NUCLEAR FUSION DEFECTIVE; OGR1, OPAQUE AND GROWTH RETARDATION; OTP, ORGANELLE
TRANSCRIPT PROCESSING; PDE, PIGMENT DEFECTIVE; PDM, PIGMENT DEFICIENT MUTANT; PGN, PENTATRICOPEPTIDE GERMINATION ON NaCl; PGR, PROTON
GRADIENT REGULATION; PNM, PROTEIN LOCALIZED TO THE NUCLEUS AND MITOCHONDRIA; PPR, PENTATRICOPEPTIDE REPEAT; PRORP, PROTEINACEOUS
RNASE P; PTAC, PLASTID TRANSCRIPTIONALLY ACTIVE; REME, REQUIRED FOR EFFICIENCY OF MITOCHONDRIAL EDITING; RPF, RNA PROCESSING FACTOR;
SLG, SLOW GROWTH; SVR, SUPPRESSOR OF VARIEGATION; VAC, VANILLA CREAM; YS, YELLOW SEEDLING; Zmempp, Z. mays EMPTY PERICARP; ZmPPR,
Zea mays PPR. (2) PPR domains were recovered from FLAGdb++ v5 (http://urgv.evry.inra.fr/projects/FLAGdb++/HTML/index.shtml) and from manually
curated published evidence. Domain identifiers follow Lurin and co-workers: "P" for PPR P-type domains, "P-D" for P-type PPR with an additional
atypical domain, "PLS" for PLS-type PPR domains, "PLS-E" for PLS-type PPR with an additional E- or E/E+-type domain, and "PLS-E-DYW" for PLS-type PPR
containing additional E/E+ and DYW domains. (3) Localization predictions were aggregated from the independent predictions of the following
software, run on the complete Arabidopsis proteome with default settings: Predotar v1.03, TargetP server v1.1, iPSORT, MultiLoc, LocTree, and AtSubP server.
The rules for proposing a conclusive prediction were as follows: if four or more programs give the same prediction, this prediction is proposed and
noted in uppercase; if three programs give the same prediction and the three others do not predict any localization, the prediction is proposed and noted
in uppercase; if two programs give the same prediction and the four others do not predict any localization, the prediction is proposed and noted in
lowercase; if three programs give the same prediction and another predicts a different localization, the main prediction is proposed and noted in lowercase;
in all other cases, no prediction is proposed (-). (4) Proteomic localizations were gathered from published studies and from organelle proteomic databases,
as indicated by the corresponding references in the last column of the table. Information in brackets states in which species the proteomic
investigation was performed: "At" for Arabidopsis thaliana, "Zm" for Zea mays, and "Os" for Oryza sativa. (5) Experimental localizations of
fluorescent proteins, obtained with either targeting peptides or full-length proteins, were collected from targeted published studies and from systematic
approaches (11,57 this report, and unpublished data from SUBA3). (6) The Conclusion column gives a probable subcellular localization by integrating prediction,
proteomic, reverse genetics, and fluorescent protein data. The decision rule is as follows: reverse genetics prevails, followed by fluorescent proteins,
proteomic data, and prediction. The conclusion is indicated in uppercase if reverse genetics data are available, if two experimental results are identical,
or if the experimental data fit the prediction; otherwise, it is indicated in lowercase. If only predictions are available, the predicted localization is
indicated with a preceding "p". (7) Data on PPR embryo-defective (EMB) mutants were obtained from the SeedGenes database (http://www.seedgenes.org/index.html)
and from manually curated mutants in published studies. (8) Molecular functions based on reverse genetics approaches were obtained from the literature; the
localization of the molecular function is indicated in brackets. Localization data are indicated as follows: M, mitochondria; C, chloroplasts; N, nucleus; V,
vacuole; Ct, cytosol; PM, plasma membrane; N/Ct, nucleus and cytoplasm; M/C, mitochondria and chloroplasts; lowercase, "probably"; "pX", predicted in
compartment X (Conclusion column).
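Note (3) of the Table 2 legend defines a voting rule over six predictors. One hedged reading of that rule, written as a small Python function, is sketched below; the function name, the fixed tool order in `TOOLS`, and the use of `None` for a tool that predicts no localization are illustrative assumptions. Combinations the legend does not spell out (for example three agreeing tools plus two dissenting ones) fall through to "-" here, which may differ from how the authors handled them.

```python
from collections import Counter
from typing import Optional, Sequence

# The six tools named in note (3); the order is an assumption for illustration.
TOOLS = ("Predotar", "TargetP", "iPSORT", "MultiLoc", "LocTree", "AtSubP")


def consensus_prediction(calls: Sequence[Optional[str]]) -> str:
    """Combine six per-tool calls ("M", "C", ... or None for no prediction).

    Returns an uppercase code for a strong consensus, a lowercase code
    for a weaker one, and "-" when no prediction is proposed.
    """
    if len(calls) != len(TOOLS):
        raise ValueError("expected one call per tool")
    made = [c.upper() for c in calls if c is not None]
    if not made:
        return "-"
    top, n_top = Counter(made).most_common(1)[0]
    n_silent = len(calls) - len(made)  # tools predicting no localization
    n_other = len(made) - n_top        # tools predicting something else
    if n_top >= 4:
        return top                     # four or more agree
    if n_top == 3 and n_silent == 3:
        return top                     # three agree, the rest are silent
    if n_top == 2 and n_silent == 4:
        return top.lower()             # two agree, the rest are silent
    if n_top == 3 and n_other == 1:
        return top.lower()             # three agree, one disagrees
    return "-"


print(consensus_prediction(["M", "M", "M", "C", None, None]))  # m
```

Counting "silent" and "dissenting" tools separately mirrors the way the legend distinguishes tools that predict nothing from tools that predict a different compartment.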
Figure 1. Examples of typical subcellular localizations observed using confocal microscopy. Confocal images of protoplasts obtained from Nicotiana
benthamiana leaves infiltrated with constructs containing (A) the first 300 bp, coding for the first 100 amino acids, of six PPR ORFs fused to the RFP
coding sequence (TP-RFP) or (B) the full-length sequence of four PPR ORFs fused to the RFP coding sequence (FL-RFP). The RFP fluorophore (red),
the MitoTracker Green staining (green), and the chlorophyll autofluorescence (blue) were visualized simultaneously. Overlay panels show the
combined fluorescence from RFP, MitoTracker, and chlorophyll autofluorescence. Loc, deduced subcellular localization; M, mitochondria; C, chloroplasts;
N/Ct, nucleus and cytosol; M/C, mitochondria and chloroplasts. Bars: 10 µm.



with 17 out of 18 (94%) and 53 out of 57 (93%) compatible
localizations, respectively (Table 3).

As summarized in Table 2, and taking into account all the
approaches described above, we assigned each Arabidopsis PPR
protein a probable localization according to the strength of the
available data. Localization based on reverse genetics, when
available, prevailed over all other approaches. Because we showed
that the experimental localizations of fusion proteins were highly
correlated with the localization of the molecular function when
it had been identified (Table 3), these data prevailed over
proteomic and bioinformatic data. In turn, identification of a
PPR protein in an organellar proteome, although showing some
discrepancies with functional data that suggest localization
errors linked to this technique, was, as far as we know, more
reliable than bioinformatic predictions. Finally, when no
experimental data were available, we proposed a predicted
localization in mitochondria or chloroplasts (pM or pC). Figure 2
gives a graphical view of these results. The number of PPR
proteins with a suspected or proved subcellular localization in
at least one of the two organelles increased significantly with
our study. For example, the number of PPR proteins with
experimental mitochondrial or chloroplast localization data
increased by more than 50% (from 134 to 212), and 19 PPR
proteins with experimental dual targeting to mitochondria and
chloroplasts were added to the 10 previously known. Overall, 275
PPR proteins (60%) are expected to function in mitochondria,
44% of them having been validated in experimental studies.
Additionally, 109 PPR proteins (24%) are expected to function in
plastids, 82% of which have been demonstrated experimentally.
Forty-five PPR proteins (10%) are suspected to be addressed to
both plastids and mitochondria. Finally, five PPR proteins have
been shown to have an atypical localization: PRORP2 and PRORP3
were shown to be addressed to the nucleus, 58 PNM1 and GRP23 to
both the nucleus and mitochondria, 27,28,57 and AT3G53170 was
observed in both nuclear and chloroplastic extracts during
proteomic studies. 48,59 Only 24 PPR proteins (5%) do not have
any clear localization based on reported experimental or
bioinformatic investigations.
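As a consistency check, the class sizes given above can be tallied against the 458 annotated PPR genes. The sketch below only recomputes percentages from the counts stated in the text; the class labels are our own shorthand, and treating the five classes as mutually exclusive is an assumption that the counts happen to satisfy (275 + 109 + 45 + 5 + 24 = 458).

```python
TOTAL_PPR = 458  # PPR genes annotated in the Arabidopsis thaliana genome

# Class sizes as reported in the text (labels are our own shorthand)
COUNTS = {
    "mitochondria (M, pM)": 275,
    "plastids (C, pC)": 109,
    "both organelles (M/C)": 45,
    "atypical (N, N/M, N/C)": 5,
    "unclear": 24,
}


def class_percentages(counts: dict, total: int) -> dict:
    """Share of the family in each class, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError("classes do not add up to the family size")
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def relative_increase(before: int, after: int) -> float:
    """Fractional change, e.g. 0.5 for a 50% increase."""
    return (after - before) / before


if __name__ == "__main__":
    for label, pct in class_percentages(COUNTS, TOTAL_PPR).items():
        print(f"{label:<26}{pct:5.1f}%")
    # PPRs with experimental mitochondrial/chloroplast data, before and after this study
    print(f"increase: {relative_increase(134, 212):.0%}")
```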
Table 3. Correlations between localization data sets

Data sets (number of PPR proteins with data in this set)

                        Fusion proteins     Fusion proteins
                        This study (126)    All data (208)     Proteomics (84)    Reverse genetics (68)
Predictions (377)       70/87 (80%)         135/159 (85%)      48/68 (71%)        44/55 (80%)
Reverse genetics (68)   17/18 (94%)         53/57 (93%)        12/15 (80%)        -
Proteomics (84)         15/19 (79%)         30/36 (83%)        -                  -

In each cell of the table, the number and percentage of compatible localizations among the proteins with data in both data sets are
indicated. Two results are considered compatible when their localizations are coherent: for example, an experimental localization in both
organelles and a prediction or proteomic result indicating only one of the two organelles.
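In our reading, the compatibility criterion in the Table 3 legend amounts to asking whether two localizations share at least one compartment. The sketch below computes one cell of such a table under that assumption: it restricts the comparison to proteins present in both data sets, counts compatible pairs, and rounds to a whole percentage. The dictionary layout, function names, and the placeholder identifiers in the toy example are hypothetical; the last lines only re-derive the percentages printed in Table 3 from their own fractions.

```python
from typing import Dict, FrozenSet, Tuple

Loc = FrozenSet[str]


def compatible(a: Loc, b: Loc) -> bool:
    """Assumed criterion: the two localizations share at least one compartment."""
    return bool(a & b)


def percent(n_ok: int, n: int) -> int:
    """Whole-number percentage, as printed in Table 3."""
    return round(100 * n_ok / n)


def table_cell(set1: Dict[str, Loc], set2: Dict[str, Loc]) -> Tuple[int, int, int]:
    """(compatible, compared, percent) over proteins present in both data sets."""
    shared = set1.keys() & set2.keys()
    n = len(shared)
    n_ok = sum(compatible(set1[g], set2[g]) for g in shared)
    pct = percent(n_ok, n) if n else 0
    return n_ok, n, pct


# Toy data with placeholder identifiers (not real PPR proteins)
F = frozenset
fusion = {"ppr_a": F({"M", "C"}), "ppr_b": F({"C"}), "ppr_c": F({"M"})}
proteomics = {"ppr_a": F({"M"}), "ppr_b": F({"M"}), "ppr_d": F({"C"})}
print(table_cell(fusion, proteomics))  # (1, 2, 50)

# Fractions and percentages as printed in Table 3
PUBLISHED = {(70, 87): 80, (135, 159): 85, (48, 68): 71, (44, 55): 80,
             (17, 18): 94, (53, 57): 93, (12, 15): 80, (15, 19): 79, (30, 36): 83}
print(all(percent(a, b) == p for (a, b), p in PUBLISHED.items()))  # True
```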



Discussion 

RFP fusions with PPR targeting peptides allowed us to study
the subcellular localization of many members of the PPR
family. Our aim in this study was to clarify the subcellular
localization of 166 members of the large PPR family that had
been selected because their localization predictions were
ambiguous when we started this work. To do so, we used a
strategy of high-throughput Gateway cloning of the first 300 bp
of PPR ORFs (corresponding to the N-terminal 100 amino acids of
the proteins) combined with a systematic microscopy investigation
of the localization of transiently expressed RFP-tagged proteins.
When the first 100 amino acids displayed an interesting
localization pattern, we performed, in a second step, a similar
study using the whole ORF. Overall, with this work, we provided
experimental information on the localization of 131 PPR proteins.

We have shown that 129 PPR proteins have functional targeting
peptides able to address the RFP protein to one or both
organelles. Seventeen of them have been published previously in
dedicated studies and were shown to localize in agreement with
our systematic results (Tables 2 and 3). 19,21,60-71 Additionally,
15 PPR proteins (including HCF152 and OTP51) were identified in
the same compartment using untargeted proteomic approaches
(Tables 2 and 3). 44,59,72-76 These independent localization
results largely validate our systematic strategy.

The strategy we used to study protein localization can be
performed at large scale to provide rapid functional information
on organellar proteins. Nonetheless, some limitations have to be
kept in mind when considering the results. First, the use of
Nicotiana benthamiana is convenient because its leaves are easy
to work with, but addressing signals might have evolved slightly
differently in distinct dicotyledonous species, which could
explain some discrepancies in the results. Second, agro-
infiltration to transform plant cells and the preparation of
protoplasts to visualize expression are two steps known to
generate stresses which, in some cases, may affect the
conclusions. Finally, the use of the very strong 2X35S promoter
to drive chimeric protein expression may overwhelm the
translation and import machineries, leading to erroneous
localization. However, the low number of discrepancies between
our results and published information obtained using a very
large set of techniques largely validates our strategy and
strengthens our results (Table 3).

Most discrepancies between our work and previous experimental
localizations concern dual-localized proteins. Four of our
dual-localized candidates (EMB175, AT5G14080, AT1G64100,
AtC401) were previously detected in a single organelle using
proteomic approaches (PPDB). 74 Similarly, MEF11 and AHG11 were
functionally characterized in mitochondrial editing, 77-79 and an
AT3G62890-GFP fusion was previously observed in plastids
(in-house data, SUBA3), whereas our results suggested a dual
localization in both organelles for these three proteins. In
contrast, three PPR proteins (AT2G37230, AT3G15130, AT5G06540)
are suspected to have a dual localization because of proteomic
results (PPDB) 42,44 or expression of fusion proteins
(unpublished results from SUBA3), but were observed in only one
of the two organelles in our study. Finally, five proteins
previously observed in plastid extracts (AT1G09900, AT1G19720,
AT2G28050, AT3G01580) or shown to be involved in plastid editing
(AT3G14330) were observed in mitochondria in our study. Without
any functional characterization, these differences cannot be
definitively resolved. Erroneous dual localization based on
RFP-fusion localization could be explained by artifacts triggered
by overexpression, whereas erroneous dual localization based on
proteomic experiments could be due to sample contamination. On
the other hand, erroneous single localization might be common
because of limited protein detection in one of the compartments
during proteomic or microscopy experiments. Moreover, the
functional characterization of a protein in one of the two
organelles does not refute its localization in the other.
Because of these experimental detection limitations, and because
we believe that dual-localized PPR proteins are largely
underestimated (see below), we have tentatively concluded that
these 14 PPR proteins are localized in both organelles.

During this work, we did not observe the nuclear localization
of GRP23 published by Ding and co-workers; 27 however, we did
observe a mitochondrial localization of the TP fused to RFP, as
described previously by Narsai and co-workers. 57 The GRP23
Nuclear Localization Signal (NLS), located at positions 99-108,
was not fully included in the 100-amino-acid fragment used in
our experiments. 27 Taken together, these results suggest that
GRP23, like PNM1, may localize in both mitochondria and nucleus.
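Whether an internal signal is captured by an N-terminal fusion is simple interval arithmetic: the first 300 bp of an ORF encode residues 1-100 (three nucleotides per codon), so a signal spanning residues 99-108 is only partly contained. The helpers below are a hypothetical sketch using 1-based, inclusive residue coordinates.

```python
def fragment_length_aa(n_nucleotides: int) -> int:
    """Residues encoded by an in-frame N-terminal ORF fragment (3 nt per codon)."""
    return n_nucleotides // 3


def residues_covered(fragment_end: int, motif_start: int, motif_end: int) -> int:
    """How many residues of a motif (1-based, inclusive) fall within 1..fragment_end."""
    return max(0, min(fragment_end, motif_end) - motif_start + 1)


def fully_covered(fragment_end: int, motif_end: int) -> bool:
    """True if the whole motif lies inside the N-terminal fragment."""
    return motif_end <= fragment_end


end = fragment_length_aa(300)          # 100 residues
print(end)                             # 100
print(residues_covered(end, 99, 108))  # 2: only residues 99-100 of the NLS
print(fully_covered(end, 108))         # False
```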

Addressing of PPR proteins to both organelles is
underestimated. We identified 19 new PPR proteins that could
have a role in both organelles. Integration of proteomic data
and previous fluorescent subcellular localization studies
suggests that, overall, at least 45 PPR proteins could be dual
targeted. Recently, about 100 nuclear-encoded proteins were
shown to be targeted to both mitochondria and plastids. 80 They
are proposed to carry out important cellular housekeeping
activities. In addition, a study showed that in many cases the
dual targeting of proteins is conserved in three distant
Viridiplantae species, 81 suggesting that some PPR proteins could
have the same dual localization in several species, probably
with related functions.

Within the PPR family, five proteins have been reported to be
dually addressed to mitochondria and plastids. 57,58,69,82 The two
orthologs, PPR2263 of maize and MITOCHONDRIAL EDITING
FACTOR29 of Arabidopsis (included in our study), were shown
to localize mainly in mitochondria, where they edit nad5
and cob transcripts, but also in plastids, where their function
remains to be elucidated. 69 Four other PPRs (PRORP1, OTP87,
AT1G06270, AT4G21170) were not assayed in our investigation
because their predicted localizations were not ambiguous
according to our criteria. AT1G06270 and AT4G21170 are
uncharacterized P-type PPR proteins shown to be dually localized
by Narsai and co-workers. 57 PROTEINACEOUS RNASE P 1 (PRORP1) was
the first PPR protein shown to be dually addressed. 58 PRORP1
is an atypical PPR protein composed of 5.5 consecutive PPR
repeats linked to a carboxyl-terminal (C-terminal) metallonuclease
domain by a structural zinc-binding domain. 83 This protein
is responsible for the nucleolytic maturation of tRNAs, an
activity required in both organelles. Using targeting peptides
fused to GFP, three proteins (OTP87, AT1G06270, AT4G21170) were
also found in both organelles. 57,82 OTP87 is an essential PPR
protein required for RNA editing of mitochondrial nad7 and atp1
transcripts in A. thaliana. However, depletion of OsPPR1, the
ortholog of OTP87 in O. sativa, by an antisense strategy was
described to affect chloroplast biogenesis. 84 The predicted
localizations of these five dual-localized proteins are either
mitochondrial or plastidial (Table 2). Similarly, among the 45
PPR proteins suspected to be localized in both organelles, eight
are predicted in chloroplasts, 28 in mitochondria, and only nine
have no predicted subcellular localization (Table 2). This
suggests that many dual-targeted PPR proteins might still be
unidentified. In particular, we suspect that many might be among
the 172 PPR proteins having a clear localization prediction in
one of the two organelles. Moreover, although different
mechanisms of dual targeting exist in the plant cell, 85 the
current information does not allow us to hypothesize by which
mechanism PPR proteins could be dual targeted, preventing the
prediction of these dual localizations.

Dual targeting to mitochondria and chloroplasts is an emerging
class of localization in the plant cell, and the PPR family
seems to make an important contribution to it. Taking into
account the functions of PPR proteins in RNA editing, RNA
processing, and translation, this type of localization in the
PPR family is not surprising and could be seen as a way to
control or coordinate organelle RNA metabolism. 86,87 However,
this hypothesis requires testing because, until now, only one
PPR protein has been shown to function in both organelles. 58
The analysis of the domains of a PPR protein could help to infer
its putative function. PPR proteins with dual localization seem
to be present in all types of functional categories. However,
among the 45 dual-localized PPR proteins, 31 belong to the
PPR-PLS subclass, suggesting a probable overrepresentation of
this subclass among dual-targeted PPR proteins. Nevertheless, it
is important to note that the localization of many PPR-P
proteins (115) has not been characterized yet, probably biasing
this observation.

PPR proteins localized outside organelles seem to represent
atypical examples in the family. Using the first 300 bp, we also
identified nine PPR proteins potentially addressed outside the
organelles, i.e., giving a nuclear and cytosolic localization.
None were confirmed using the whole ORFs (Table 1, Fig. 1). This
suggests that the number of PPR proteins located outside
organelles is smaller than we thought when this work was
initiated. In total, fewer than 1-2% of PPR proteins could
function in the cytoplasm and/or the nucleus (Fig. 2). This value
may still be overestimated, as model gene loci are sometimes
mispredicted, in particular concerning the initiation codon. It
may also indicate that correct targeting sometimes requires a
peptide longer than the 100 amino acids used in our work. Huang
and co-workers showed that the length of mitochondrial
presequences varies greatly, from 19 to 109 amino acids. 36 For
GRP23, the beginning of the NLS has been located at amino acid
99. Using the first 100 amino acids, we observed RFP signal in
mitochondria (as previously described by Narsai and
co-workers 57 ), whereas the full-length protein localizes in the
nucleus. 27 These findings confirm that systematic localization
using whole proteins could give more accurate information on PPR
localizations.

The case of PNM1 is even more complicated. The nuclear
localization of PNM1 is controlled by an NLS in the C-terminus
of the protein, 82 but the whole protein is addressed to
mitochondria. The nuclear localization was only obtained with
a truncated form of the protein lacking the predicted targeting
peptide and fused to the reporter fluorescent protein. This
nuclear localization was confirmed using a specific antibody.
The meaning of such a complex addressing system is still a
matter of debate, but it suggests that a few very interesting
PPR proteins could be involved in signaling between organelles
and nucleus. 86

Figure 2. Distribution of the localization of Arabidopsis thaliana
PentatricoPeptide Repeat (PPR) proteins. Classes of localization and
percentage of each class in the PPR family are shown. pM, predicted
mitochondrial localization in dark red; M, mitochondrial localization in
light red; pC, predicted plastid localization in dark green; C, plastid
localization in light green; M/C, mitochondrial and plastid localization in
yellow; N/C, nuclear and chloroplastic localization in black; M/N,
mitochondrial and nuclear localization in blue; N, nuclear localization in
pink; unclear localization in light gray.

Materials and Methods 

Bioinformatic predictions and data collection. Subcellular
localization predictions for the PPR proteins were performed
using the TargetP server (http://www.cbs.dtu.dk/services/TargetP/)
(version 1.01 was used when we initiated this work to select the
166 PPRs, and version 1.1 was used when we built Tables 1 and 2),
Predotar v1.03 (http://urgi.versailles.inra.fr/predotar/predotar.html),
iPSORT (http://ipsort.hgc.jp/), Loctree
(https://www.rostlab.org/owiki/index.php/Loctree), Multiloc
(http://abi.inf.uni-tuebingen.de/Services/MultiLoc/), and AtSubP
(http://bioinfo3.noble.org/AtSubP/?dowhat=About) software using
default settings. Proteomic data were recovered from published
proteomic references and subcellular proteome databases:
PPDB (Plant Proteome Database, http://ppdb.tc.cornell.edu/), 43
SUBA3 (Subcellular location database for Arabidopsis proteins,
http://suba.plantenergy.uwa.edu.au/), 5 and AT_CHLORO
(http://www.grenoble.prabi.fr/at_chloro/). 42

Subcellular localization of proteins. The first 100 codons or
the whole PPR ORFs were PCR-amplified from Arabidopsis thaliana
(ecotype Columbia-0) genomic DNA or cDNA using iProof DNA
polymerase (Bio-Rad), specific primers (listed in Table S1), and
a two-step amplification protocol as described previously. 11 PCR
products were recombined into pDONR207 (Invitrogen) using
Gateway® BP Clonase® II Enzyme Mix (Invitrogen) as described. 11
For microscopic investigation, LR recombination reactions were
performed using Gateway® LR Clonase® Enzyme Mix (Invitrogen) to
transfer PPR sequences from entry vectors to the pGREENII-derived
destination vector p0229-RFP2, allowing C-terminal translational
fusion with the RFP protein under the control of the 2X35S
promoter. The proper ORF fusion was confirmed by sequencing
using the P35STL (5'-CGAATCTCAA GCAATCAAGC-3') and
RFP2rev (5'-TGAACTCGGT GATGACGTTC-3') primers.

Binary vectors were introduced into thermo-competent
Agrobacterium tumefaciens strain C58C1 harboring the helper
plasmid pSOUP. 88 A single resistant colony was then used to
inoculate 5 mL of Luria-Bertani medium supplemented with
5 mg L⁻¹ tetracycline, 50 mg L⁻¹ kanamycin, and 2.5 mg L⁻¹
rifampicin. This overnight pre-culture was then diluted 10
times and grown further overnight under similar conditions.
After centrifugation, Agrobacterium cells were resuspended in
agro-infiltration buffer (10 mM MES/KOH pH 5.6, 10 mM MgCl2,
150 µM 3',5'-dimethoxy-4'-hydroxyacetophenone [Sigma-Aldrich])
to a final OD600 between 0.2 and 0.3, and incubated at room
temperature for 2 h. Agrobacterium suspensions were infiltrated
into leaves of Nicotiana benthamiana using 1 mL needleless
syringes.
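Bringing the resuspended cells to the target OD600 is a C1V1 = C2V2 calculation, because centrifugation conserves cell number. The sketch below computes the buffer volume for a given pellet. It assumes that OD readings lie within the linear range of the spectrophotometer; the function name and the example values are hypothetical and are not taken from the protocol above.

```python
TARGET_OD600_RANGE = (0.2, 0.3)  # final OD600 used for agro-infiltration


def resuspension_volume_ml(od_culture: float, culture_ml: float, od_target: float) -> float:
    """Buffer volume (mL) for the pellet of `culture_ml` of culture at `od_culture`
    so that the suspension reaches `od_target`.

    Cell number is conserved through centrifugation, so OD1 * V1 = OD2 * V2.
    Assumes OD readings are within the linear range of the spectrophotometer.
    """
    low, high = TARGET_OD600_RANGE
    if not low <= od_target <= high:
        raise ValueError(f"target OD600 should lie in {TARGET_OD600_RANGE}")
    return od_culture * culture_ml / od_target


# Hypothetical example: pellet from 10 mL of culture read at OD600 = 1.5
print(resuspension_volume_ml(1.5, 10.0, 0.25))  # 60.0
```

Checking the target against the 0.2-0.3 window guards against accidentally preparing a suspension outside the range used for infiltration.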

Protoplasts were prepared from leaf material (harvested
48-96 h after infiltration), cut into thin strips, and
incubated in an enzyme solution containing 4.3 g L⁻¹ Murashige and
Skoog Basal Salt Mixture (ICN Biomedicals), 0.5 g L⁻¹ MES,
20 g L⁻¹ sucrose, 80 g L⁻¹ mannitol, KOH to pH 5.6, 0.4 g L⁻¹
Pectinase from Rhizopus sp. (Sigma-Aldrich), 1 g L⁻¹ Driselase®
Basidiomycetes sp. (Sigma-Aldrich), and 2 g L⁻¹ Cellulase
Onozuka RS from Trichoderma viride (SERVA Electrophoresis
GmbH) at 28 °C for 2-4 h. 89 Protoplasts were observed using an
Eclipse TE2000S inverted microscope (Nikon), and the RFP signal
was monitored using a custom filter block (exciter HQ546/12,
emitter HQ605/75, beam-splitter Q560lp; Chroma Technology).
For each construct, at least three independent agro-infiltrations
were performed, and each was observed independently by two of
the authors. To confirm mitochondrial localizations, protoplasts
were stained with 1 µM MitoTracker Green (Invitrogen) for
15-30 min. For confocal microscopy, proteins were visualized
using a spectral Leica SP2 AOBS confocal microscope (Leica
Microsystems) equipped with argon and HeNe lasers. Fluorescent
signals were detected with a sequential configuration using a
488 nm laser line (MitoTracker Green: excitation/emission
488/510-530 nm) and a 543 nm laser line (RFP: excitation/emission
543/570-600 nm; chlorophyll autofluorescence: excitation/emission
543/600-700 nm). The images were coded red (RFP), green
(MitoTracker Green), and blue (chlorophyll autofluorescence),
giving yellow co-localization in mitochondria when green and red
signals overlap in merged images, and violet co-localization in
plastids when blue and red signals overlap. Microscopic
observations were performed using a Leica HCPL APO 63×/1.20
Water Corr/0.17 Lbd.BL objective. Each image shown represents the
projection of optical sections taken as a Z series.
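The color coding described above can be illustrated with a toy per-pixel rule: red (RFP) overlapping green (MitoTracker Green) reads as yellow, and red overlapping blue (chlorophyll) reads as violet. The sketch below is a hypothetical illustration on small intensity arrays scaled to 0-1, with an arbitrary threshold. It is not a description of how the images in this study were analyzed.

```python
from typing import Dict, List

Channel = List[List[float]]  # one image channel, intensities scaled to 0..1


def colocalization_counts(rfp: Channel, mito: Channel, chloro: Channel,
                          threshold: float = 0.5) -> Dict[str, int]:
    """Count RFP-positive pixels overlapping each organelle marker.

    Yellow = RFP (red) + MitoTracker Green (green): mitochondria.
    Violet = RFP (red) + chlorophyll autofluorescence (blue): plastids.
    """
    counts = {"yellow": 0, "violet": 0}
    for r_row, g_row, b_row in zip(rfp, mito, chloro):
        for r, g, b in zip(r_row, g_row, b_row):
            if r < threshold:
                continue  # no RFP signal in this pixel
            if g >= threshold:
                counts["yellow"] += 1
            if b >= threshold:
                counts["violet"] += 1
    return counts


def deduced_localization(counts: Dict[str, int], min_pixels: int = 1) -> str:
    """Map co-localization counts to a table code (M, C, or M/C)."""
    sites = []
    if counts["yellow"] >= min_pixels:
        sites.append("M")
    if counts["violet"] >= min_pixels:
        sites.append("C")
    return "/".join(sites) or "neither organelle"


# Toy 2 x 3 image: one mitochondrial and one plastid RFP-positive pixel
rfp = [[0.9, 0.9, 0.0], [0.0, 0.8, 0.1]]
mito = [[0.7, 0.0, 0.9], [0.0, 0.0, 0.0]]
chloro = [[0.0, 0.0, 0.0], [0.0, 0.9, 0.0]]
counts = colocalization_counts(rfp, mito, chloro)
print(counts, deduced_localization(counts))  # {'yellow': 1, 'violet': 1} M/C
```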